AD-A277  526 


2. REPORT DATE: 1994

3. REPORT TYPE AND DATES COVERED: Annual technical, 5/1/92 - 4/30/93

4. TITLE AND SUBTITLE: A self-organizing neural network architecture for auditory
and speech perception with applications to acoustic and other temporal prediction problems

5. FUNDING NUMBERS: F49620-92-J-0225

6. AUTHOR(S): Stephen Grossberg and Michael Cohen

7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES)

Boston University
Center for Adaptive Systems
and
Department of Cognitive and Neural Systems
Boston, MA 02215

8. PERFORMING ORGANIZATION REPORT NUMBER: AFOSR-TR-94-01[illegible]

9. SPONSORING/MONITORING AGENCY NAME(S) AND ADDRESS(ES)

Air Force Office of Scientific Research
Bolling AFB, DC 20332

10. SPONSORING/MONITORING AGENCY REPORT NUMBER

11. SUPPLEMENTARY NOTES

12a. DISTRIBUTION/AVAILABILITY STATEMENT

Approved for public release; distribution unlimited.

12b. DISTRIBUTION CODE


13.  ABSTRACT  (Maximum  200  words) 

This project is developing autonomous neural network models for the real-time
perception and production of acoustic and speech signals. A new acoustic filter was
developed to show how coarticulated context-sensitive auditory signals can be
separated and represented in a more context-independent fashion, thereby easing the
recognition problem. Parallel processing streams sensitive to sustained and transient
signals are used, as in vision. A model of working memory was developed that
automatically compensates for variable acoustic or speech rates. The model shows how
invariance of the short term storage of variable-rate acoustic streams can explain data
about categorical boundary shifts when the distributions of silent intervals or of
vowel durations are altered. New learning and categorization nets were shown to
discriminate vowels with comparable accuracy but much higher compression than alternative
methods. Models of skilled motor control were developed to clarify how speech and arm
movements can be planned and flexibly modified by task requirements. Studies of neural
oscillators suggest how rhythmic behaviors relevant to perception and action, notably
synchronous oscillations, may be generated and controlled.


14. SUBJECT TERMS

15. NUMBER OF PAGES: 13 pages

16. PRICE CODE

17. SECURITY CLASSIFICATION OF REPORT: unclassified

18. SECURITY CLASSIFICATION OF THIS PAGE: unclassified

19. SECURITY CLASSIFICATION OF ABSTRACT

20. LIMITATION OF ABSTRACT

NSN 7540-01-280-5500
Standard Form 298 (Rev 2-89)
Prescribed by ANSI Std. Z39-18
298-102

Approved for public release; distribution unlimited.


PUBLICATIONS  PARTIALLY  SUPPORTED  BY 
THE  AIR  FORCE  OFFICE  OF  SCIENTIFIC  RESEARCH 

Contract  AFOSR  F49620-92-J-0225 

MAY  1,  1992— APRIL  30,  1993 

Center  for  Adaptive  Systems 
and 

Department  of  Cognitive  and  Neural  Systems 
Boston  University 


1. Boardman, I., Cohen, M.A., and Grossberg, S. (1993). Variable rate working memories
for phonetic categorization and invariant speech perception. Technical Report
CAS/CNS-TR-93-008, Boston University. In Proceedings of the world congress
on neural networks, Portland, III, 2-5. Hillsdale, NJ: Erlbaum Associates. (%&)

2. Bradski, G., Carpenter, G.A., and Grossberg, S. (1992). STORE working memory
networks for storage and recall of arbitrary temporal sequences. Technical Report
CAS/CNS-TR-92-028, Boston University. Submitted for publication. (*#%+&)

3. Bradski, G. and Cohen, M.A. (1993). A preliminary look at a fast learning architecture
for speaker independent speech recognition. In Proceedings of the world congress
on neural networks, Portland, III, 37-39. Hillsdale, NJ: Erlbaum Associates. (+@)

4. Bullock, D., Grossberg, S., and Mannes, C. (1993). The VITEWRITE model of
handwriting production. Technical Report CAS/CNS-TR-93-011, Boston University.
In Proceedings of the world congress on neural networks, Portland, I, 507-511.
Hillsdale, NJ: Erlbaum Associates. (%+&)

5. Carpenter, G.A. and Govindarajan, K.K. (1993). Evaluation of speaker normalization
methods for vowel recognition using fuzzy ARTMAP and K-NN. Technical Report
CAS/CNS-TR-93-013, Boston University. In Proceedings of the world
congress on neural networks, Portland, III, 10-15. Hillsdale, NJ: Erlbaum
Associates. (#%+&)

6. Carpenter, G.A. and Govindarajan, K.K. (1993). Speaker normalization methods for
vowel recognition: Comparative analysis using neural network and nearest neighbor
classifiers. Technical Report CAS/CNS-TR-93-039, Boston University. Submitted
for publication. (%#+&)

7. Carpenter, G.A. and Grossberg, S. (1993). Integrating symbolic and neural processing in
a self-organizing architecture for pattern recognition and prediction. Technical Report
CAS/CNS-TR-93-002, Boston University. In V. Honavar and L. Uhr (Eds.),
Integrating symbol processors and connectionist networks for artificial intelligence
and cognitive modelling. New York: Academic Press. (#%+&)

8. Cohen, M.A. (1993). Neural network models of speech and language perception and
recognition. In Proceedings of the world congress on neural networks, Portland,
III, 1. Hillsdale, NJ: Erlbaum Associates.

9. Cohen, M.A., Grossberg, S., and Pribe, C. (1993). Neural control of interlimb
coordination and gait timing in bipeds and quadrupeds. Technical Report
CAS/CNS-TR-93-004, Boston University. Submitted for publication. (@*%&)


10. Cohen, M.A., Grossberg, S., and Pribe, C. (1993). A neural pattern generator that
exhibits arousal-dependent human gait transitions. Technical Report
CAS/CNS-TR-93-017, Boston University. In Proceedings of the world congress on neural
networks, Portland, IV, 285-288. Hillsdale, NJ: Erlbaum Associates. (*%+&)

11. Cohen, M.A., Grossberg, S., and Pribe, C. (1993). Frequency-dependent phase
transitions in the coordination of human bimanual tasks. Technical Report
CAS/CNS-TR-93-018, Boston University. In Proceedings of the world congress on neural
networks, Portland, IV, 491-494. Hillsdale, NJ: Erlbaum Associates. (*%+&)

12. Cohen, M.A., Grossberg, S., and Pribe, C. (1993). Quadruped gait transitions from
a neural pattern generator with arousal modulated interactions. Technical Report
CAS/CNS-TR-93-019, Boston University. In Proceedings of the world congress
on neural networks, Portland, II, 610-613. Hillsdale, NJ: Erlbaum Associates. (*%+&)

13. Cohen, M.A., Grossberg, S., and Wyse, L. (1992). A neural network spectral model of
pitch detection and representation. Submitted for publication. (%&)

14. Grossberg, S. (1993). Self-organizing neural networks for stable control of autonomous
behavior in a changing world. In J.G. Taylor (Ed.), Mathematical approaches to
neural networks. Amsterdam: Elsevier Science Publishers, pp. 139-197. (%&)

15. Grossberg, S. and Grunewald, A. (1993). Statistical properties of single and competing
nonlinear fast-slow oscillations in noise. Technical Report CAS/CNS-TR-93-022,
Boston University. In Proceedings of the world congress on neural networks,
Portland, IV, 303-307. Hillsdale, NJ: Erlbaum Associates. (%+&)


*  Also  supported  in  part  by  the  AFOSR  URI. 

@ Also supported in part by the Army Research Office.

%  Also  supported  in  part  by  ARPA. 

#  Also  supported  in  part  by  British  Petroleum. 

+  Also  supported  in  part  by  the  National  Science  Foundation. 
&  Also  supported  in  part  by  the  Office  of  Naval  Research. 


RESEARCH  SUMMARIES 


Pitch  Perception 

One  of  the  major  human  auditory  abilities  is  the  effortless  identification  of  multiple 
speakers  while  simultaneously  understanding  the  content  of  their  speech.  This  content  is  of 
course  speaker  independent.  There  is  considerable  evidence  that  an  important  cue  for  the 
segregation  of  speech  by  speaker  is  the  pitch  of  the  utterance. 

Cohen, Grossberg, and Wyse have developed a neural model of pitch perception, and
have  submitted  this  work  for  publication.  This  model  provides  a  unified  explanation  of 
the  key  experimental  data  on  pitch  perception  and  is  part  of  our  theory  of  auditory  object 
recognition.  The  model  computes  a  spatial  representation  of  pitch  as  one  of  several  steps 
in  our  development  of  a  theory  of  auditory  source  segregation  and  streaming.  Simulated 
data  include  pitch  perception  with  mistuned  or  missing  components,  shifted  harmonics,  and 
various  types  of  noise;  the  auditory  dominance  region;  octave  shifts  to  ambiguous  stimuli;  and 
Shepard  auditory  barberpole  illusions.  The  model  provides  a  basis  for  linking  featural  and 
spatial  information  to  rapidly  locate  and  bind  together  the  auditory  signals  that  correspond 
to  a  target  in  space.  Research  on  such  target  localization  has  begun,  as  noted  in  the  next 
paragraph.  [Article  13] 

Design  of  Variable-Rate  Working  Memories  and  Categorization  Networks 

An  important  problem  in  speech  recognition  is  the  identification,  selection,  and  coherent 
grouping  of  speech  units  from  a  continuously  fluctuating  acoustic  stream.  Of  particular 
importance  in  this  process  is  the  influence  of  current  auditory  context  on  prior  representations 
which  have  already  been  partially  constructed.  Most  models  of  speech  recognition  work 
strictly  forward  in  time,  and  have  considerable  difficulty  in  dealing  with  such  backwards 
effects,  especially  when  speech  rate  is  variable.  A  diagnostic  set  of  data  from  Repp,  among 
others,  shows  that  the  prior  distribution  of  silence  in  the  speech  waveform  plays  a  critical 
role  in  determining  whether  a  single  stop  consonant  (e.g.  /aga/)  or  a  stop  consonant  cluster 
(e.g.  /agpa/)  is  heard.  These  data  probe  the  conditions  under  which  phonetic  boundaries 
are  perceived.  They  illustrate  that  the  speech  code  is  not  derived  directly  from  bottom-up 
filtered  data,  but  is,  rather,  an  emergent  property  of  bottom-up  interactions  with  learned 
top-down  expectations. 

This  project  begins  to  model  the  interaction  between  a  working  memory  that  temporarily 
stores  speech  tokens  and  its  feedback  interactions  with  a  speech  categorization  network. 

The  first  project  showed  how  a  working  memory  could  be  designed  such  that  all  possible 
groupings  of  stored  items  could  be  learned  in  a  stable  fashion,  even  if  a  grouping  was  later 
stored  and  learned  as  part  of  a  larger  grouping.  Thus  the  word  MY  is  not  forgotten  when 
it is later learned as part of the word MYSELF. This was shown for arbitrary temporal
streams  of  data,  occurring  with  arbitrary  rates,  delays,  and  repetitions. 

Next  it  was  shown  how  such  a  working  memory  could  be  modified  such  that  its  integration 
rate  automatically  adjusts  to  generate  a  representation  that  is  independent  of  the  input 
rate.  This  invariance  property  is  used  to  explain  key  properties  of  the  Repp  data.  Thus  the 
variability  of  categorical  recognition  boundaries  can  be  derived  from  the  invariance  of  the 
working  memory  code  in  response  to  variable-rate  acoustic  signals. 


The  model  also  clarifies  other  backward  effects  in  speech,  such  as  when  a  stop  (e.g. 
/ba/)  percept  is  transformed  into  a  glide  (e.g.  /wa/)  percept  by  varying  the  duration  of  the 
subsequent  vowel.  [Articles  1  and  2] 

Speech  Filtering  and  Coarticulation 

The  rate  of  reception  of  normal  spoken  language  places  severe  demands  on  the  sensory, 
motor,  and  cognitive  mechanisms  which  control  the  speech  production  process.  The  vocal 
musculature  often  cannot  keep  up  with  the  rate  of  transmission  of  spoken  speech.  Nature’s 
solution  to  the  motoric  problem  is  to  overlap  significant  parts  of  gestural  motion  for  adjacent 
speech  stops  and  consonants,  a  phenomenon  called  coarticulation. 

This  coarticulation  creates  corresponding  perceptual  problems,  because  the  acoustic 
stream  for  the  same  phonetic  percept  is  of  necessity  context-dependent.  An  early  stage 
of  auditory  neural  processing  must  be  able  to  separate  and  represent  these  segments  in  a 
context-independent  fashion  to  disambiguate  these  coarticulations. 

Parallel  processing  streams  that  are  sensitive  to  sustained  and  transient  features  are 
suggested  to  play  such  a  role  in  speech  perception.  Analogous  parallel  streams  are  known 
to  occur  in  vision.  We  are  preparing  a  paper  which  models  these  auditory  sustained  and 
transient mechanisms and shows how their combination can disambiguate sonorant (vowels
/i/, nasals /m, n/, and glides /y, w/) from non-sonorant (stops /t/ and fricatives /s, z/) phonetic
classes.  Furthermore,  these  detectors  can  distinguish  between  the  major  subcategories  of 
these  classes.  This  work  hereby  proposes  a  partial  description  of  how  the  human  auditory 
apparatus  solves  the  perceptual  problem  caused  by  coarticulation.  Later  processing  is  greatly 
simplified  by  this  processing  stage. 
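
As a rough illustration of the sustained/transient split described above, the following Python sketch applies two hypothetical detectors to a sampled amplitude envelope. A trailing moving average stands in for a sustained channel, and a half-wave rectified first difference stands in for a transient (onset) channel. Both detectors, the function names, and the window length are assumptions chosen for illustration only; they are not the filters of the paper in preparation.

```python
def sustained_channel(envelope, window=3):
    """Trailing moving average: responds for as long as energy persists."""
    out = []
    for i in range(len(envelope)):
        start = max(0, i - window + 1)
        segment = envelope[start:i + 1]
        out.append(sum(segment) / len(segment))
    return out


def transient_channel(envelope):
    """Half-wave rectified first difference: responds only to onsets."""
    out = [0.0]
    for prev, cur in zip(envelope, envelope[1:]):
        out.append(max(0.0, cur - prev))
    return out


# Toy envelope (made up): silence, a steady vowel-like segment, then silence.
envelope = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0]
sustained = sustained_channel(envelope)
transient = transient_channel(envelope)
```

In this toy case the transient channel is active only at the onset sample, while the sustained channel stays active throughout the steady segment, so the pair of outputs separates abrupt from steady portions of the signal.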

Speaker  Normalization  in  Speech 

A total of 32 intrinsic and 128 extrinsic methods of vowel normalization have been
analysed on the Peterson-Barney database using the K-Nearest Neighbor (K-NN) and Fuzzy
ARTMAP pattern classifiers, with 10 nearest neighbors used throughout for K-NN.

Fuzzy ARTMAP is a self-organizing neural network classifier, developed recently in our
group by Professors Carpenter and Grossberg with several PhD students. It is capable of
real-time hypothesis testing to discover and learn recognition categories whose number, shape,
and size adapt to the statistics of an arbitrary nonstationary data stream. In this preliminary
study, K-NN performed slightly better than Fuzzy ARTMAP because it exhaustively stores
all the data, but it needed about 10 times as much memory to gain this slight advantage.
The much improved compression of Fuzzy ARTMAP will be of great importance in large
database problems. More powerful ART systems also promise to achieve better performance
and compression. The optimal intrinsic normalization scheme was Bark differences in
frequency. The optimal extrinsic normalization scheme involved choosing a best linear
transformation of the data. [Article 5]

Handwriting  Production 

The psychophysical properties of speech articulator movements and arm reaching
movements share many common features. Models of skilled arm movement control thus promise
to shed light on how spoken language is generated. This knowledge, in turn, can help to
clarify how articulatory constraints may influence the speech perception process.

The  VITEWRITE  model  of  Bullock,  Grossberg,  and  Mannes  suggests  how  complex 
handwritten  movements  may  be  flexibly  generated  with  different  sizes,  speeds,  and  styles. 
To  achieve  this  competence,  it  is  shown  how  a  suitably  designed  working  memory  interacts 
via  nonlinear  feedback  with  a  multijoint  trajectory  generator  that  is  modulated  by  volitional 
scalar  GO  (speed)  and  GRO  (size)  signals.  It  is  also  shown  how  a  redundant  manipulator 
simplifies  the  control  problem.  A  new  concept  of  motor  program  is  hereby  developed  for 
the  control  of  multiple  motor  constraints.  Neural  data  about  frontal  cortex,  motor  cortex, 
parietal  cortex,  and  basal  ganglia  are  clarified  by  the  model.  Psychophysical  data  about 
handwriting  control,  such  as  the  isochrony  principle,  asymmetric  velocity  profiles,  and  the 
two  thirds  power  law  relating  velocity  to  curvature  arise  as  emergent  properties  of  model 
interactions.  [Article  4] 
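
The two-thirds power law mentioned above is commonly written as A = K C^(2/3), relating angular velocity A to curvature C, or equivalently v = K C^(-1/3) for tangential speed v. The short Python check below is an independent numerical illustration and not output of the VITEWRITE model: it traces an ellipse with sinusoidal coordinates, a standard case in which the law holds exactly with K = (ab)^(1/3). The function names and the semi-axis values are assumptions made for the example.

```python
import math


def speed_and_curvature(a, b, t):
    """Speed and curvature of the path x = a*cos(t), y = b*sin(t) at time t."""
    dx, dy = -a * math.sin(t), b * math.cos(t)      # velocity components
    ddx, ddy = -a * math.cos(t), -b * math.sin(t)   # acceleration components
    speed = math.hypot(dx, dy)
    curvature = abs(dx * ddy - dy * ddx) / speed ** 3
    return speed, curvature


def power_law_gain(a, b, t):
    """K in v = K * curvature**(-1/3); constant along the path if the law holds."""
    speed, curvature = speed_and_curvature(a, b, t)
    return speed * curvature ** (1.0 / 3.0)


a, b = 3.0, 1.0  # illustrative semi-axes
gains = [power_law_gain(a, b, 2 * math.pi * k / 50) for k in range(50)]
```

Every entry of `gains` equals (ab)^(1/3) up to rounding, so the movement slows where the path curves sharply and speeds up where it is nearly straight.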

Central  Pattern  Generators 

Rhythmic processes are important in acoustical and speech perception and production. A
class of central pattern generator models was developed to explain behavioral data about
human and animal gaits and their transitions. Simulated data include all the cat gait transitions
(walk-trot-pace-gallop), the human walk-run transition, and the transition from anti-phase
movements to synchronous in-phase movements during increasingly rapid finger oscillations.
These analyses clarified how a volitional GO, or speed, signal can interact with suitably
designed feedback networks to generate only the desired gaits and transitions, quickly and
stably. They also clarified how switches from either anti-phase to in-phase, or in-phase to
anti-phase movements can be generated as the GO signal increases in different parameter
ranges of a single network. [Articles 9-12]

Synchronous  Dynamics  of  Nonlinear  Oscillators 

In both speech and neural processing, a role for synchronous binding of spatially
distributed features into coherent perceptual units has been psychophysically and neurally
reported. This study modeled how neural networks could be designed to achieve rapid
synchronization of distributed data even in high levels of cellular noise. It was also shown how
stochastic resonance could arise in the model, thereby improving its signal-to-noise ratio in
noise. [Article 15]


SELECTED  ABSTRACTS 


VARIABLE RATE WORKING MEMORIES FOR PHONETIC
CATEGORIZATION AND INVARIANT SPEECH PERCEPTION

Ian Boardman, Michael Cohen, and Stephen Grossberg

Technical  Report  CAS/CNS-TR-93-008,  Boston  University 
In  Proceedings  of  the  World  Congress  on  Neural  Networks 
Hillsdale,  NJ:  Erlbaum  Associates,  1993,  III,  pp.  2-5 


Abstract 

Speech  can  be  understood  at  widely  varying  production  rates.  A  working  memory  is 
described  for  short-term  storage  of  temporal  lists  of  input  items.  The  working  memory  is 
a  cooperative-competitive  neural  network  that  automatically  adjusts  its  integration  rate,  or 
gain,  to  generate  a  short-term  memory  code  for  a  list  that  is  independent  of  item  presentation 
rate.  Such  an  invariant  working  memory  model  is  used  to  simulate  data  of  Repp  (1980) 
concerning  the  changes  of  phonetic  category  boundaries  as  a  function  of  their  presentation 
rate.  Thus  the  variability  of  categorical  boundaries  can  be  traced  to  the  temporal  invariance 
of  the  working  memory  code. 
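
The idea that an adjustable integration rate can yield a rate-independent code can be sketched with a single leaky integrator. The Python fragment below is a minimal illustration under stated assumptions, not the cooperative-competitive network of the report: the integrator dx/dt = gain * (1 - x), the gain rule `adaptive_gain` (gain inversely proportional to item duration), and the constant `k` are all hypothetical.

```python
import math


def stored_activity(duration, gain):
    """Activity reached by dx/dt = gain * (1 - x), x(0) = 0, after `duration`."""
    return 1.0 - math.exp(-gain * duration)


def adaptive_gain(duration, k=2.0):
    """Hypothetical gain rule: integration rate scales with presentation rate."""
    return k / duration


durations = [0.05, 0.10, 0.20]  # seconds per item: fast, medium, slow speech
fixed = [stored_activity(d, gain=20.0) for d in durations]
adaptive = [stored_activity(d, adaptive_gain(d)) for d in durations]
```

With a fixed gain the stored activity grows with item duration, so the code depends on speech rate. With the assumed adaptive gain the product of gain and duration is constant, and the stored activity is the same at every rate.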


Supported in part by the Air Force Office of Scientific Research (AFOSR F49620-92-J-0225
and AFOSR 90-0128), ARPA (ONR N00014-92-J-4015), and the Office of Naval
Research (ONR N00014-91-J-4100).


STORE  WORKING  MEMORY  NETWORKS  FOR  STORAGE 
AND  RECALL  OF  ARBITRARY  TEMPORAL  SEQUENCES 

Gary Bradski†, Gail A. Carpenter‡, and Stephen Grossberg§

Technical  Report  CAS/CNS-TR-92-028,  Boston  University 
Submitted  to  Biological  Cybernetics 


Abstract 

Neural  network  models  of  working  memory,  called  Sustained  Temporal  Order  REcurrent 
(STORE) models, are described. They encode the invariant temporal order of sequential
events  in  short  term  memory  (STM)  in  a  way  that  mimics  cognitive  data  about  working 
memory,  including  primacy,  recency,  and  bowed  order  and  error  gradients.  As  new  items 
are  presented,  the  pattern  of  previously  stored  items  is  invariant  in  the  sense  that  relative 
activations  remain  constant  through  time.  This  invariant  temporal  order  code  enables  all 
possible  groupings  of  sequential  events  to  be  stably  learned  and  remembered  in  real  time, 
even  as  new  events  perturb  the  system.  Such  a  competence  is  needed  to  design  self-organizing 
temporal  recognition  and  planning  systems  in  which  any  subsequence  of  events  may  need  to 
be  categorized  in  order  to  control  and  predict  future  behavior  or  external  events.  STORE 
models  show  how  arbitrary  event  sequences  may  be  invariantly  stored,  including  repeated 
events.  A  preprocessor  interacts  with  the  working  memory  to  represent  event  repeats  in 
spatially  separate  locations.  It  is  shown  why  at  least  two  processing  levels  are  needed  to 
invariantly  store  events  presented  with  arbitrary  durations  and  interstimulus  intervals.  It 
is  also  shown  how  network  parameters  control  the  type  and  shape  of  primacy,  recency,  or 
bowed  temporal  order  gradients  that  will  be  stored. 


† Supported in part by the Air Force Office of Scientific Research (AFOSR 90-0128) and
the Office of Naval Research (ONR N00014-91-J-4100 and ONR N00014-92-J-1309).

‡ Supported in part by British Petroleum (BP 89A-1204), ARPA (AFOSR 90-0083 and
ONR N00014-92-J-4015), the National Science Foundation (NSF IRI-90-00530), and the
Office of Naval Research (ONR N00014-91-J-4100).

§ Supported in part by the Air Force Office of Scientific Research (AFOSR F49620-92-J-0225),
ARPA (AFOSR 90-0083 and ONR N00014-92-J-4015) and the Office of Naval
Research (ONR N00014-91-J-4100 and ONR N00014-92-J-1309).


EVALUATION  OF  SPEAKER  NORMALIZATION  METHODS 
FOR  VOWEL  RECOGNITION  USING  FUZZY  ARTMAP  AND  K-NN 

Gail A. Carpenter† and Krishna K. Govindarajan‡

Technical Report CAS/CNS-TR-93-013, Boston University
In Proceedings of the World Congress on Neural Networks
Hillsdale, NJ: Erlbaum Associates, 1993, III, pp. 10-15


Abstract 

A procedure that uses fuzzy ARTMAP and K-Nearest Neighbor (K-NN) categorizers to
evaluate intrinsic and extrinsic speaker normalization methods is described. Each classifier
is trained on preprocessed, or normalized, vowel tokens from about 30% of the speakers of
the Peterson-Barney database, then tested on data from the remaining speakers. Intrinsic
normalization methods included one nonscaled, four psychophysical scales (bark, bark with
end-correction, mel, ERB), and three log scales, each tested on four different combinations of
the fundamental (F0) and the formants (F1, F2, F3). For each scale and frequency
combination, four extrinsic speaker adaptation schemes were tested: centroid subtraction across all
frequencies (CS), centroid subtraction for each frequency (CSi), linear scale (LS), and linear
transformation (LT). A total of 32 intrinsic and 128 extrinsic methods were thus compared.
Fuzzy ARTMAP and K-NN showed similar trends, with K-NN performing somewhat better
and fuzzy ARTMAP requiring about 1/10 as much memory. The optimal intrinsic
normalization method was bark scale, or bark with end-correction, using the differences between
all frequencies (Diff All). The order of performance for the extrinsic methods was LT, CSi,
LS, and CS, with fuzzy ARTMAP performing best using bark scale with Diff All; and K-NN
choosing psychophysical measures for all except CSi.
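
The pipeline described above (psychophysical scale, frequency differences, optional per-speaker centering, nearest-neighbor vote) can be sketched in a few lines of Python. Several choices below are assumptions rather than details taken from the report: the Zwicker-Terhardt approximation is used for the bark scale, `centroid_subtract` is one plausible reading of the CSi scheme, and the frequency values are made up for illustration and are not Peterson-Barney data. The sketch uses k = 1 because it has only two training tokens.

```python
import math
from collections import Counter
from itertools import combinations


def hz_to_bark(f_hz):
    """Zwicker-Terhardt approximation of the bark scale (an assumed choice)."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def diff_all(freqs_hz):
    """Intrinsic normalization: bark differences between every pair of frequencies."""
    barks = [hz_to_bark(f) for f in freqs_hz]
    return [hi - lo for lo, hi in combinations(barks, 2)]


def centroid_subtract(tokens):
    """Extrinsic normalization (CSi-like): subtract one speaker's per-feature mean."""
    n = len(tokens)
    means = [sum(col) / n for col in zip(*tokens)]
    return [[x - m for x, m in zip(tok, means)] for tok in tokens]


def knn_classify(train, query, k=1):
    """Majority vote among the k training tokens nearest to `query`."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Illustrative (F0, F1, F2, F3) values in Hz, invented for this example.
train = [
    (diff_all([130, 270, 2290, 3010]), "i"),
    (diff_all([125, 730, 1090, 2440]), "a"),
]
query = diff_all([220, 310, 2790, 3310])  # higher-pitched speaker, /i/-like formants
label = knn_classify(train, query, k=1)

# Per-speaker centering applied to two made-up tokens from one speaker.
speaker_tokens = [diff_all([220, 310, 2790, 3310]), diff_all([210, 850, 1220, 2810])]
centered = centroid_subtract(speaker_tokens)
```

Taking differences on a bark scale removes much of the overall shift between a low-pitched and a high-pitched speaker, which is why the query lands nearest the /i/ token even though its raw frequencies are all higher.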


† Supported in part by British Petroleum (BP 89A-1204), ARPA (AFOSR 90-0083 and
ONR N00014-92-J-4015), the National Science Foundation (NSF IRI-90-00530), and the
Office of Naval Research (ONR N00014-91-J-4100).

‡ Supported in part by the Air Force Office of Scientific Research (AFOSR F49620-92-J-0225),
ARPA (ONR N00014-92-J-4015), and the National Science Foundation (NSF IRI-90-00530).


NEURAL  CONTROL  OF  INTERLIMB  COORDINATION 
AND  GAIT  TIMING  IN  BIPEDS  AND  QUADRUPEDS 

Michael A. Cohen†, Stephen Grossberg‡, and Christopher Pribe§

Technical  Report  CAS/CNS-TR-93-004,  Boston  University 
Submitted  to  Journal  of  Neurophysiology 


Abstract 

1) A family of central pattern generators, called GO Gait Generators, is described in which both the
frequency and the relative phase of oscillations are controlled by a scalar arousal or GO signal that instantiates
the will to act. The model cells obey shunting membrane equations, and interact via fast excitatory feedback
signals and slow inhibitory feedback signals, organized as an on-center off-surround anatomy.

2)  With  two  excitatory  cells,  or  cell  populations,  the  model  describes  an  opponent  processing  network 
in  which  both  in-phase  and  anti-phase  oscillations  can  occur  at  different  arousal  levels.  This  two-channel 
oscillator  can  also  produce  phase  transitions  from  either  in-phase  to  anti-phase  oscillations,  or  anti-phase  to 
in-phase  oscillations,  in  different  parameter  ranges,  as  the  GO  signal  increases. 

3)  The  two-channel  oscillator  is  used  to  simulate  data  from  human  bimanual  finger  coordination  tasks 
in  which  anti-phase  oscillations  at  low  frequencies  spontaneously  switch  to  in-phase  oscillations  at  high 
frequencies,  in-phase  oscillations  can  be  performed  both  at  low  and  high  frequencies,  phase  fluctuations  occur 
at  the  anti-phase  in-phase  transition,  and  a  “seagull  effect”  of  larger  errors  occurs  at  intermediate  phases. 
When  driven  by  environmental  patterns  with  intermediate  phase  relationships,  the  model’s  output  exhibits 
a tendency to slip toward purely in-phase and anti-phase relationships as observed in human subjects.

4)  A  four-channel  oscillator  is  used  to  simulate  quadruped  vertebrate  gaits,  including  the  amble,  the 
walk,  all  three  pairwise  gaits  (trot,  pace,  and  gallop),  and  the  pronk.  Spatial  or  temporal  asymmetries  in 
oscillator activation by the GO signal can trigger these transitions. Rapid transitions are simulated in the
order — walk,  trot,  pace,  and  gallop — that  occurs  in  the  cat. 

5)  This  precise  switching  control  is  achieved  by  using  GO-dependent  modulation  of  the  model’s  inhibitory 
interactions  that  generates  a  different  functional  connectivity  in  a  single  network  at  different  arousal  levels. 
Such  task-specific  modulation  of  functional  connectivity  in  neural  pattern  generators  has  been  experimentally 
reported  in  invertebrates.  A  role  for  such  a  mechanism  in  gait-switching  is  predicted  to  occur  in  vertebrates. 

6)  A  four  channel  oscillator  can  generate  the  two  standard  human  gaits:  the  walk  and  the  run.  Although 
these  two  gaits  are  qualitatively  different,  they  both  have  the  same  limb  order  and  may  exhibit  oscillation 
frequencies  that  overlap.  The  model  simulates  the  walk  and  the  run  via  qualitatively  different  waveform 
shapes. The fraction of cycle that activity is above threshold quantitatively distinguishes the two gaits,
much  as  the  duty  cycles  of  the  feet  are  longer  in  the  walk  than  in  the  run. 
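
The gait measure in item 6, the fraction of a cycle during which activity is above threshold, is straightforward to compute from a sampled waveform. The Python sketch below uses two made-up one-cycle waveforms with the same frequency but different shapes. The waveforms, the threshold value, and the function name `duty_fraction` are assumptions for illustration and do not come from the model's simulations.

```python
import math


def duty_fraction(waveform, threshold):
    """Fraction of samples in one cycle whose activity exceeds `threshold`."""
    above = sum(1 for value in waveform if value > threshold)
    return above / len(waveform)


n = 1000  # samples per cycle
# Same oscillation frequency, qualitatively different waveform shapes.
broad = [math.sin(2 * math.pi * k / n) for k in range(n)]        # long active phase
narrow = [math.sin(2 * math.pi * k / n) ** 5 for k in range(n)]  # brief active phase

walk_like = duty_fraction(broad, threshold=0.5)
run_like = duty_fraction(narrow, threshold=0.5)
```

The broad waveform spends about one third of its cycle above threshold and the narrow one considerably less, so the two are separated by this measure even though their frequencies are identical.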


† Supported in part by the Air Force Office of Scientific Research (AFOSR 90-0128 and
AFOSR F49620-92-J-0225).

‡ Supported in part by the Air Force Office of Scientific Research (AFOSR 90-0175 and
AFOSR F49620-92-J-0225), the National Science Foundation (NSF IRI-90-24877), and the
Office of Naval Research (ONR N00014-92-J-1309).

§ Supported in part by the Army Research Office (ARO DAAL03-88-K-0088), the
Advanced Research Projects Agency (AFOSR 90-0083), the National Science Foundation (NSF
IRI-90-24877), and the Office of Naval Research (ONR N00014-92-J-1309).


A  NEURAL  NETWORK  SPECTRAL  MODEL 
OF  PITCH  DETECTION  AND  REPRESENTATION 

Michael A. Cohen†, Stephen Grossberg‡, and Lonce Wyse*

Submitted  to  Journal  of  the  Acoustical  Society  of  America 


Abstract 

A  neural  network  model  of  pitch  perception,  called  the  Spatial  Pitch  Network  or  SPINET 
model,  is  developed  and  analysed.  The  model  neurally  instantiates  ideas  from  the  spectral 
pitch  modeling  literature  and  joins  them  to  basic  neural  network  signal  processing  designs  to 
simulate a broader range of perceptual pitch data than previous spectral models. The
components of the model are interpreted as peripheral mechanical and neural processing stages,
which  are  capable  of  being  incorporated  into  a  larger  network  architecture  for  segmenting 
multiple  sound  sources  in  the  environment. 

The core of the new model transforms a spectral representation of an acoustic source
into a spatial distribution of pitch strengths. The SPINET model uses a weighted “harmonic
sieve” whereby the strength of activation of a given pitch depends upon a weighted sum of
narrow regions around the harmonics of the nominal pitch value, and higher harmonics
contribute less to a pitch than lower ones. Suitably chosen harmonic weighting functions
enable computer simulations of pitch perception data involving mistuned components, shifted
harmonics, and various types of continuous spectra including rippled noise. It is shown how
the weighting functions produce the dominance region, how they lead to octave shifts of
pitch in response to ambiguous stimuli, and how they lead to a pitch region in response
to the octave-spaced “Shepard” tone complexes without the use of attentional mechanisms
to limit pitch choices. An on-center off-surround network in the model helps to produce
noise suppression, partial masking, and edge pitch. A method is described for relating the
model’s pitch activation functional to statistical human performance and for comparing the
network  model  with  Goldstein’s  statistical  Optimum  Processor  Theory.  Finally,  it  is  shown 
how  peripheral  filtering  and  short  term  energy  measurements  produce  a  model  pitch  estimate 
that  is  sensitive  to  certain  component  phase  relationships. 
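
The weighted harmonic sieve described above can be sketched as follows. This is a minimal Python illustration of the general technique, not the SPINET implementation: the weighting function `harmonic_weight`, its slope, the window tolerance, the number of harmonics, and the candidate grid are all assumed values, and the peripheral filtering and on-center off-surround stages are omitted.

```python
import math


def harmonic_weight(h, slope=0.2):
    """Hypothetical weighting: decreases with harmonic number, floored at zero."""
    return max(0.0, 1.0 - slope * math.log2(h))


def pitch_strength(spectrum, pitch_hz, n_harmonics=12, tolerance=0.03):
    """Weighted sum of spectral energy in narrow windows around each harmonic."""
    total = 0.0
    for h in range(1, n_harmonics + 1):
        center = h * pitch_hz
        for freq, amplitude in spectrum:
            if abs(freq - center) <= tolerance * center:
                total += harmonic_weight(h) * amplitude
    return total


def best_pitch(spectrum, candidates):
    """Candidate pitch with the largest sieve output."""
    return max(candidates, key=lambda p: pitch_strength(spectrum, p))


candidates = range(50, 1001, 10)  # candidate pitches in Hz
# Spectra as (frequency in Hz, amplitude) pairs; values are made up.
complete = [(200.0, 1.0), (400.0, 1.0), (600.0, 1.0)]
missing_fundamental = [(400.0, 1.0), (600.0, 1.0), (800.0, 1.0)]

pitch_complete = best_pitch(complete, candidates)
pitch_missing = best_pitch(missing_fundamental, candidates)
```

With these assumed weights both spectra yield a 200 Hz pitch, including the one with no energy at 200 Hz. The outcome depends on how fast the weights fall off: a much steeper function would favor 400 Hz for the second spectrum, which is one way to see why the choice of weighting function matters for the data the model simulates.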


† Supported in part by the Air Force Office of Scientific Research (AFOSR F49620-92-J-0225).

‡ Supported in part by the Air Force Office of Scientific Research (AFOSR F49620-92-J-0225),
ARPA (ONR N00014-92-J-4015), and the Office of Naval Research (ONR N00014-91-J-4100).

* Supported in part by the American Society for Engineering Education, and the Air Force
Office of Scientific Research (AFOSR F49620-92-J-0225).


STATISTICAL  PROPERTIES  OF  SINGLE  AND  COMPETING 
NONLINEAR  FAST-SLOW  OSCILLATORS  IN  NOISE 

Stephen Grossberg† and Alexander Grunewald‡

Technical  Report  CAS/CNS-TR-93-022,  Boston  University 
In  Proceedings  of  the  World  Congress  on  Neural  Networks 
Hillsdale,  NJ:  Erlbaum  Associates,  1993,  IV,  303-307 


Abstract 

Statistical  properties  of  fast-slow  Ellias-Grossberg  oscillators  are  studied  in  response  to 
deterministic  and  noisy  inputs.  Oscillatory  responses  remain  stable  in  noise  due  to  the 
slow  inhibitory  variable,  which  establishes  an  adaptation  level  that  centers  the  oscillatory 
responses  of  the  fast  excitatory  variable  to  deterministic  and  noisy  inputs.  Competitive 
interactions between oscillators improve the stability in noise. Although individual
oscillation amplitudes decrease with input amplitude, the average total activity increases with
input  amplitude,  thereby  suggesting  that  oscillator  output  is  evaluated  by  a  slow  process  at 
downstream  network  sites. 


† Supported in part by the Air Force Office of Scientific Research (AFOSR F49620-92-J-0225),
ARPA (ONR N00014-92-J-4015), the National Science Foundation (NSF IRI-90-24877),
and the Office of Naval Research (ONR N00014-91-J-4100).

‡ Supported in part by the Air Force Office of Scientific Research (AFOSR F49620-92-J-0225).




